Skip to content
This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

always retry on specific errors from azcopy #1196

Merged

Conversation

bmc-msft
Copy link
Contributor

As indicated in #1195, there isn't an ergonomic mechanism for fuzzers & the agent to share knowledge of when an azcopy sync is occurring. As such, errors that are indicative of this difficulty should be retried automatically without impacting the "how many times to retry on error" counter.

@bmc-msft bmc-msft linked an issue Aug 27, 2021 that may be closed by this pull request
src/agent/onefuzz/src/az_copy.rs Show resolved Hide resolved
src/agent/onefuzz/src/az_copy.rs Show resolved Hide resolved
if attempt_count >= RETRY_COUNT {
Err(backoff::Error::Permanent(x))
Err(err) => {
if !should_retry_without_attempt_increment(&err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We unconditionally increment attempt_count above, and this function decides whether or not we increment failure_count. I'm guessing the names drifted during dev.

What if we just rename the predicate to should_always_retry(), which matches the constant, is a non-negated phrasing of a boolean, and lets the caller interpret appropriately (without attempt/failure baggage).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change, the # of attempts and # of failures were the same.

Originally when developing this, I used a single counter. However, in considering the information given to the user, I thought it useful to track these as different values to understand this difference.

Consider a single counter, where the source changes during upload (triggering the azcopy failure) 3 times.

azcopy sync attempt 1 failed
azcopy sync attempt 1 failed
azcopy sync attempt 1 failed

With two counters, this would look like:

azcopy sync attempt 1 failed (failure 1)
azcopy sync attempt 2 failed (failure 1)
azcopy sync attempt 3 failed (failure 1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the change, and it seems really reasonable.

My suggestion is just that the predicate name is now confusing, since (as your example illustrates), we always increment attempt, but only increment failure conditionally (depending on if the error has the "always retry" status).

When I wrote "without attempt/failure baggage", I just meant "in the name of the predicate function". We should keep that distinction.

@bmc-msft bmc-msft requested a review from ranweiler August 27, 2021 16:25
@bmc-msft bmc-msft enabled auto-merge (squash) August 27, 2021 17:15
@bmc-msft bmc-msft merged commit 4e30eea into microsoft:main Aug 27, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task fails if file updated while being uploaded to container via sync
3 participants